Query Logs Analytics: A Systematic Literature Review
In the digital era, user interactions with various resources such as databases, data warehouses, websites, and knowledge graphs (KGs) are increasingly mediated through digital platforms. These interactions leave behind digital traces, systematically captured in the form of logs. Logs, when effectively exploited, provide high value across industry and academia, supporting critical services (e.g., recovery and security), user-centric applications (e.g., recommender systems), and quality-of-service improvements (e.g., performance optimization). Despite their importance, research on log usage remains fragmented across domains, and no comprehensive study currently consolidates existing efforts. This paper presents a systematic survey of log usage, focusing on Database (DB), Data Warehouse (DW), Web, and KG logs. More than 300 publications were analyzed to address three central questions: (1) do different types of logs share common structural and functional characteristics? (2) are there standard pipelines for their usage? (3) which constraints and non-functional requirements (NFRs) guide their exploitation? The survey reveals a limited number of end-to-end approaches, the absence of standardization across log usage pipelines, and the existence of shared structural elements among different types of logs. By consolidating existing knowledge, identifying gaps, and highlighting opportunities, this survey provides researchers and practitioners with a comprehensive overview of log usage and sheds light on promising directions for future research, particularly regarding the exploitation and democratization of KG logs.
BEAVER: An Enterprise Benchmark for Text-to-SQL
Chen, Peter Baile, Wenz, Fabian, Zhang, Yi, Kayali, Moe, Tatbul, Nesime, Cafarella, Michael, Demiralp, Çağatay, Stonebraker, Michael
Existing text-to-SQL benchmarks have largely been constructed using publicly available tables from the web with human-generated tests containing question and SQL statement pairs. They typically show very good results and lead people to think that LLMs are effective at text-to-SQL tasks. In this paper, we apply off-the-shelf LLMs to a benchmark containing enterprise data warehouse data. In this environment, LLMs perform poorly, even when standard prompt engineering and RAG techniques are utilized. As we will show, the reasons for poor performance are largely due to three characteristics: (1) public LLMs cannot train on enterprise data warehouses because they are largely in the "dark web", (2) schemas of enterprise tables are more complex than the schemas in public data, which makes the SQL-generation task inherently harder, and (3) business-oriented questions are often more complex, requiring joins over multiple tables and aggregations. As a result, we propose a new dataset, BEAVER, sourced from real enterprise data warehouses together with natural language queries and their correct SQL statements, which we collected from actual user history. We evaluated this dataset using recent LLMs and demonstrated their poor performance on this task. We hope this dataset will help future researchers build more sophisticated text-to-SQL systems that can do better on this important class of data.
Optimal Decision Making Through Scenario Simulations Using Large Language Models
The rapid evolution of Large Language Models (LLMs) has markedly expanded their application across diverse domains, transforming how complex problems are approached and solved. Initially conceived to predict subsequent words in texts, these models have transcended their original design to comprehend and respond to the underlying contexts of queries. Today, LLMs routinely perform tasks that once seemed formidable, such as writing essays, poems, stories, and even developing software code. As their capabilities continue to grow, so too do the expectations of their performance in even more sophisticated domains. Despite these advancements, LLMs still encounter significant challenges, particularly in scenarios requiring intricate decision-making, such as planning trips or choosing among multiple viable options. These tasks often demand a nuanced understanding of various outcomes and the ability to predict the consequences of different choices, which are currently outside the typical operational scope of LLMs. This paper proposes an innovative approach to bridge this capability gap. By enabling LLMs to request multiple potential options and their respective parameters from users, our system introduces a dynamic framework that integrates an optimization function within the decision-making process. This function is designed to analyze the provided options, simulate potential outcomes, and determine the most advantageous solution based on a set of predefined criteria. By harnessing this methodology, LLMs can offer tailored, optimal solutions to complex, multi-variable problems, significantly enhancing their utility and effectiveness in real-world applications. This approach not only expands the functional envelope of LLMs but also paves the way for more autonomous and intelligent systems capable of supporting sophisticated decision-making tasks.
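The decision loop described above (collect candidate options and their parameters, simulate each outcome, score it against predefined criteria, pick the best) can be sketched minimally. All names here, including the trip-planning example, are illustrative assumptions rather than the paper's actual framework:

```python
def choose_best_option(options, simulate_outcome, utility):
    """Return the option whose simulated outcome maximizes the utility score.

    `options` are the candidates (with parameters) gathered from the user,
    `simulate_outcome` predicts a consequence for each option, and `utility`
    scores that consequence against predefined criteria.
    """
    return max(options, key=lambda opt: utility(simulate_outcome(opt)))

# Hypothetical trip-planning example: minimize cost plus a penalty per hour.
trips = [
    {"name": "train",  "cost": 120, "hours": 5},
    {"name": "flight", "cost": 200, "hours": 2},
    {"name": "bus",    "cost": 60,  "hours": 9},
]
best = choose_best_option(
    trips,
    simulate_outcome=lambda t: (t["cost"], t["hours"]),
    utility=lambda outcome: -(outcome[0] + 20 * outcome[1]),
)
```

With the weights shown, the train (120 + 20*5 = 220) beats both the flight and the bus (240 each); changing the per-hour penalty changes the winner, which is the point of running the optimization rather than asking the LLM to guess.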
Towards augmented data quality management: Automation of Data Quality Rule Definition in Data Warehouses
Tamm, Heidi Carolina, Nikiforova, Anastasija
In the contemporary data-driven landscape, ensuring data quality (DQ) is crucial for deriving actionable insights from vast data repositories. The objective of this study is to explore the potential for automating data quality management within data warehouses, a type of data repository commonly used by large organizations. By conducting a systematic review of existing DQ tools available in the market and academic literature, the study assesses their capability to automatically detect and enforce data quality rules. The review encompassed 151 tools from various sources, revealing that most current tools focus on data cleansing and fixing in domain-specific databases rather than data warehouses. Only a limited number of tools, specifically ten, demonstrated the capability to detect DQ rules, let alone implement this in data warehouses. The findings underscore a significant gap in the market and academic research regarding AI-augmented DQ rule detection in data warehouses. This paper advocates for further development in this area to enhance the efficiency of DQ management processes, reduce human workload, and lower costs. The study highlights the necessity of advanced tools for automated DQ rule detection, paving the way for improved practices in data quality management tailored to data warehouse environments. The study can also guide organizations in selecting the data quality tool that best meets their requirements.
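As a rough illustration of what automated DQ rule detection means in practice, the sketch below profiles a sample of column values and proposes simple candidate rules (not-null, uniqueness, numeric range). This is a deliberately minimal assumption of how such a tool might start, not a reconstruction of any tool covered by the review:

```python
def infer_dq_rules(column, values):
    """Propose simple data quality rules from sampled column values.

    A minimal sketch of automated DQ rule detection; real tools profile
    whole warehouse tables and emit far richer constraints.
    """
    rules = []
    non_null = [v for v in values if v is not None]
    if len(non_null) == len(values):
        rules.append(f"{column} IS NOT NULL")            # no missing values observed
    if len(set(non_null)) == len(non_null):
        rules.append(f"{column} IS UNIQUE")              # no duplicates observed
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        lo, hi = min(non_null), max(non_null)
        rules.append(f"{column} BETWEEN {lo} AND {hi}")  # observed numeric range
    return rules
```

For example, `infer_dq_rules("customer_id", [101, 102, 103])` proposes all three rules, while a column with nulls and duplicate strings yields none. Rules inferred this way are only candidates; a human (or a stronger model) still has to confirm which ones reflect genuine business constraints.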
Translating Natural Language Queries to SQL Using the T5 Model
Wong, Albert, Pham, Lien, Lee, Young, Chan, Shek, Sadaya, Razel, Khmelevsky, Youry, Clement, Mathias, Cheng, Florence Wing Yau, Mahony, Joe, Ferri, Michael
This paper presents the development process of a natural language to SQL model using the T5 model as the basis. The models, developed in August 2022 for an online transaction processing system and a data warehouse, have a 73% and 84% exact match accuracy respectively. These models, in conjunction with other work completed in the research project, were implemented for several companies and used successfully on a daily basis. The approach used in the model development could be implemented in a similar fashion for other database environments and with a more powerful pre-trained language model.
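The exact-match metric reported in the abstract can be computed as follows; the whitespace and case normalization shown here is an assumption, since papers differ on what counts as an "exact" match between generated and gold SQL:

```python
def exact_match_accuracy(predicted_sql, gold_sql):
    """Fraction of predictions that equal the gold SQL after trivial
    whitespace and case normalization (one common reading of 'exact match')."""
    def norm(sql):
        return " ".join(sql.lower().split())
    pairs = list(zip(predicted_sql, gold_sql))
    return sum(norm(p) == norm(g) for p, g in pairs) / len(pairs)
```

Stricter variants compare raw strings; looser ones parse both queries and compare canonicalized ASTs or execution results, which forgives semantically equivalent but textually different SQL.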
Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations
Bode, Jan, Kühl, Niklas, Kreuzberger, Dominik, Hirschl, Sebastian, Holtmann, Carsten
With the increasing importance of data and artificial intelligence, organizations strive to become more data-driven. However, current data architectures are not necessarily designed to keep up with the scale and scope of data and analytics use cases. In fact, existing architectures often fail to deliver the promised value associated with them. Data mesh is a socio-technical, decentralized, distributed concept for enterprise data management. As the concept of data mesh is still novel, it lacks empirical insights from the field. Specifically, an understanding of the motivational factors for introducing data mesh, the associated challenges, implementation strategies, its business impact, and potential archetypes is missing. To address this gap, we conduct 15 semi-structured interviews with industry experts. Our results show, among other insights, that organizations have difficulties with the transition toward federated governance associated with the data mesh concept, the shift of responsibility for the development, provision, and maintenance of data products, and the comprehension of the overall concept. In our work, we derive multiple implementation strategies and suggest that organizations introduce a cross-domain steering unit, observe data product usage, create quick wins in the early phases, and favor small dedicated teams that prioritize data products. While we acknowledge that organizations need to apply implementation strategies according to their individual needs, we also deduce two archetypes that provide suggestions in more detail. Our findings synthesize insights from industry experts and provide researchers and professionals with preliminary guidelines for the successful adoption of data mesh.
Top 19 Skills You Need to Know in 2023 to Be a Data Scientist - KDnuggets
If you want to be a data scientist in 2023, there are several new skills you should add to your roster, as well as the slew of existing skills you should have already mastered. Part of the problem is job scope creep. Nobody knows what a data scientist is, or what one should do, least of all your future employer. So anything that has data gets stuck in the data science category for you to deal with. You're expected to know how to clean, transform, statistically analyze, visualize, communicate, and predict data.
Trends That Will Impact Data Analytics, AI, And Cloud In 2023 - Liwaiwai
As we enter 2023, the world of analytics, AI, and cloud is entering an exciting new phase, with a wide range of innovations and developments set to reshape the landscape. Below are some trends that will have the most impact in the coming year. In 2023, as global economic uncertainty continues, enterprises with data-intensive workloads in the cloud will need to review their cloud strategies with a greater focus on cost optimization. Cloud spending will be more closely scrutinized based on the ROI and TCO of existing projects or new investments. One area where cost optimization is particularly important in the coming year is data transfer egress costs, which can make up a significant portion of an organization's cloud bill.
Your Data Architecture Holds the Key to Unlocking AI's Full Potential
In the words of J.R.R. Tolkien, "shortcuts make long delays." I get it, we live in an age of instant gratification, with Doordash and Grubhub meals on-demand, fast-paced social media and same-day Amazon Prime deliveries. But I've learned that in some cases, shortcuts are just not possible. Such is the case with comprehensive AI implementations; you cannot shortcut success. Operationalizing AI at scale mandates that your full suite of data (structured, unstructured, and semi-structured) gets organized and architected in a way that makes it usable, readily accessible, and secure.
ETL vs ELT: Which One is Right for Your Data Pipeline? - KDnuggets
ETL and ELT are data integration pipelines that transfer data from multiple sources to a single centralized source and apply transformation and processing steps along the way. The difference between the two is that ETL transforms the data before loading, while ELT transforms the data after loading. But before diving deeply into them, let's first understand the meaning of E, T, and L. E for Extract - Extracting the data is the process of pulling it from one or more source systems. T for Transform - Transforming the data is the process of cleaning and modifying it into a format that can be used for business analysis. L for Loading - It involves loading data into a target system, which may be a data warehouse or a database. ETL is the first standardized data integration method, which emerged in the 1970s with the evolution of disk storage.
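The contrast can be sketched with Python's built-in sqlite3 standing in for both the pipeline and the warehouse (the table names and the toy cleaning step are illustrative): ETL cleans rows in the pipeline before loading, while ELT loads the raw rows first and performs the same cleaning inside the warehouse with SQL.

```python
import sqlite3

# Toy raw records extracted (E) from a source system.
raw = [(1, "19.99", " us "), (2, "5.00", "EU")]

con = sqlite3.connect(":memory:")  # stand-in for the warehouse

# --- ETL: transform (T) in the pipeline, then load (L) clean rows. ---
con.execute("CREATE TABLE sales_etl (id INT, amount REAL, region TEXT)")
clean = [(i, float(a), r.strip().upper()) for i, a, r in raw]
con.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", clean)

# --- ELT: load (L) raw rows first, then transform (T) inside the warehouse. ---
con.execute("CREATE TABLE sales_raw (id INT, amount TEXT, region TEXT)")
con.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", raw)
con.execute("""
    CREATE TABLE sales_elt AS
    SELECT id, CAST(amount AS REAL) AS amount, UPPER(TRIM(region)) AS region
    FROM sales_raw
""")
```

Both target tables end up identical; the practical difference is where the transformation compute runs (the pipeline host for ETL, the warehouse engine for ELT) and that ELT keeps the raw data around for later re-processing.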